
Comments

Add test sharding, proactive clean, and retry logic for self-hosted CI #1171

Open
sbryngelson wants to merge 9 commits into MFlowCode:master from sbryngelson:ci-test

Conversation

@sbryngelson
Member

@sbryngelson sbryngelson commented Feb 19, 2026

User description

Summary

  • Shard Frontier GPU tests into 2 parts for faster parallel execution
  • Add proactive ./mfc.sh clean in Phoenix test scripts to prevent cross-compiler contamination from stale build artifacts
  • Add --requeue to Phoenix SLURM jobs for preemption recovery
  • Add lint-gate job that must pass before self-hosted tests run
  • Add retry logic for GitHub runner tests (retry ≤5 failures)
  • Restructure self-hosted matrix with explicit cluster names

Depends on: #1170 (for monitor_slurm_job.sh and build script changes)

Test plan

  • Frontier GPU tests run in 2 shards and complete within 2h
  • Phoenix tests pass with proactive clean (no stale artifact errors)
  • Lint-gate blocks self-hosted tests on lint failure
  • GitHub runner retry logic fires on ≤5 test failures

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Test sharding across Frontier and Frontier AMD clusters; CI displays shard in job names.
    • Automatic retry logic for test runs with configurable attempts.
  • Tests

    • Persist failed test identifiers for retries and remove stale records on abort.
    • Shard-based test filtering to run only selected subsets.
  • Chores

    • Updated SLURM job defaults and added requeue behavior.
    • Workflow improvements: log archival, quoting fixes, matrix shard entries, and bot-review guard.

CodeAnt-AI Description

Shard GPU tests, add targeted retry and CI preemption/retry handling

What Changed

  • Tests can run a shard subset with --shard (e.g., "1/2"), and GitHub Actions now launches Frontier and Frontier (AMD) GPU tests as two parallel shards so each job runs only its portion of GPU tests.
  • Test runner writes failed test UUIDs to tests/failed_uuids.txt and CI retries only when 1–5 tests failed; CI avoids retrying when the file is empty or when too many failures indicate a real problem. The runner also removes stale failed_uuids.txt if tests abort.
  • SLURM submit scripts now pass shard info into test jobs; Frontier submit scripts use a shorter walltime, batch partition, and Hackathon QOS; Phoenix submit jobs auto-requeue on preemption. Builds on cluster runners get automatic retry with a clean step between attempts.
  • CI job scripts quote nproc calls to avoid shell issues and add a lint-gate step that runs before self-hosted tests.
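The targeted-retry decision described above can be sketched in shell. This is a minimal standalone illustration, not the workflow's actual code; the file path comes from the description, and the simulated file contents are invented:

```shell
#!/bin/sh
# Simulate the runner having recorded two failed test UUIDs.
mkdir -p tests
printf 'uuid-a\nuuid-b\n' > tests/failed_uuids.txt

# Retry only when the file exists, is non-empty, and lists 1-5 failures;
# a missing or empty file means nothing to retry, and >5 failures
# suggests a real problem rather than flakiness.
if [ -s tests/failed_uuids.txt ]; then
    NUM_FAILED=$(wc -l < tests/failed_uuids.txt | tr -d ' ')
else
    NUM_FAILED=0
fi

if [ "$NUM_FAILED" -ge 1 ] && [ "$NUM_FAILED" -le 5 ]; then
    echo "retrying $NUM_FAILED failed tests"
else
    echo "not retrying ($NUM_FAILED failures)"
fi
```

Guarding on `-s` (non-empty) as well as the count is what prevents the empty-file full-suite rerun mentioned later in the thread.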

Impact

✅ Faster GPU test completion via parallel shards
✅ Fewer flaky CI failures rerun (targeted retries for 1–5 failures)
✅ Fewer preemption-related test interruptions on Phoenix (auto-requeue)

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

- Shard Frontier GPU tests into 2 parts for faster parallel execution
- Add proactive ./mfc.sh clean in Phoenix test scripts to prevent
  cross-compiler contamination from stale build artifacts
- Add --requeue to Phoenix SLURM jobs for preemption recovery
- Add lint-gate job that must pass before self-hosted tests run
- Add retry logic for GitHub runner tests (retry <=5 failures)
- Add Frontier AMD test support with dedicated submit/test scripts
- Restructure self-hosted matrix with explicit cluster names

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 19, 2026 20:01
@codeant-ai
Contributor

codeant-ai bot commented Feb 19, 2026

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. If you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

@coderabbitai
Contributor

coderabbitai bot commented Feb 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds test sharding across CI and cluster jobs, propagates shard arguments through GitHub Actions and SLURM submit/test scripts, updates SLURM resource/QOS settings for Frontier clusters, adds retry and failed-UUID handling, and introduces CLI/test-suite shard filtering and failure persistence.

Changes

Cohort / File(s) → Summary

  • Frontier SLURM submit scripts (.github/workflows/frontier/submit.sh, .github/workflows/frontier_amd/submit.sh): Changed SLURM account to CFD154, reduced walltime to 01:59:00, switched partition to batch, added --qos=hackathon, and accept job_shard as a fourth positional argument.
  • Frontier test invocations (.github/workflows/frontier/test.sh, .github/workflows/frontier_amd/test.sh): Added shard_opts that conditionally sets --shard <value> when job_shard is present and appends it to GPU test command invocations.
  • Phoenix submit script (.github/workflows/phoenix/submit.sh): Added SBATCH --requeue option for automatic requeue on preemption.
  • GitHub Actions workflow (.github/workflows/test.yml): Added shard matrix entries (e.g., 1/2, 2/2), include shard in the job name, forward shard to test scripts, quote nproc usages, add retry logic around tests and the build step, remove deprecated runner env overrides, and add an Archive Logs step for non-Phoenix clusters.
  • CLI & test runner (toolchain/mfc/cli/commands.py, toolchain/mfc/test/test.py): Added --shard CLI option; implemented shard parsing/validation and test-case partitioning by shard; persist failed_uuids.txt on failures and clean it on abort or all-pass.
  • Workflow guard (.github/workflows/bench.yml): Added a condition to skip the Detect File Changes job when a pull_request_review is from a Bot.
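The "i/n" shard parsing and partitioning described for the CLI and test runner can be sketched in isolation. This is an illustrative standalone version; the function and variable names are invented, not the toolchain's actual API:

```python
# Minimal sketch of "i/n" shard parsing and round-robin partitioning.
def shard_cases(cases, shard):
    parts = shard.split("/")
    if len(parts) != 2 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"Invalid shard '{shard}': expected 'i/n' (e.g., '1/2').")
    idx, count = int(parts[0]), int(parts[1])
    if count < 1 or not 1 <= idx <= count:
        raise ValueError(f"Invalid shard '{shard}': expected 1 <= i <= n.")
    # Case k goes to shard (k % n) + 1, so shards are disjoint and
    # together cover every case exactly once.
    return [c for k, c in enumerate(cases) if k % count == idx - 1]

cases = ["t0", "t1", "t2", "t3", "t4"]
assert shard_cases(cases, "1/2") == ["t0", "t2", "t4"]
assert shard_cases(cases, "2/2") == ["t1", "t3"]
```

Round-robin (rather than contiguous-block) assignment tends to balance runtime when slow cases cluster together in the case list.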

Sequence Diagram(s)

sequenceDiagram
  participant GH as GitHub Actions
  participant Runner as CI Runner
  participant Submit as submit.sh
  participant SLURM as Scheduler
  participant TestSh as mfc.sh / Test Runner
  participant Tests as Test Suite

  GH->>Runner: start job (matrix includes shard)
  Runner->>Submit: run submit.sh with shard arg
  Submit->>SLURM: sbatch (env: JOB_SHARD)
  SLURM->>TestSh: allocate nodes, start job
  TestSh->>TestSh: compute shard_opts from JOB_SHARD
  TestSh->>Tests: run tests with --shard (shard_opts)
  Tests->>TestSh: failed UUIDs (write failed_uuids.txt)
  TestSh->>GH: exit status, artifacts/logs (archive)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

Review effort 3/5

Poem

🐰 I hopped through scripts both near and far,
Split tests in halves beneath the CI star,
Retries and shards now dance in tune,
Failed UUIDs saved—no lost buffoon! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: the title clearly and concisely summarizes the three main changes: test sharding, proactive clean, and retry logic for self-hosted CI.
  • Description check — ✅ Passed: the description exceeds template requirements with detailed summaries from both user and AI perspectives, includes a dependency reference, and provides a comprehensive test plan, though some template checkboxes remain unchecked.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codeant-ai codeant-ai bot added the size:M This PR changes 30-99 lines, ignoring generated files label Feb 19, 2026
@codeant-ai
Contributor

codeant-ai bot commented Feb 19, 2026

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Heredoc variable interpolation / injection risk
    The sbatch job script is constructed by expanding variables (e.g. $sbatch_device_opts, $sbatch_script_contents) into an unquoted heredoc passed to sbatch.
    If sbatch_script_contents contains characters that should be preserved (dollars, backticks, multiline constructs) they will be expanded at submit time rather than inside the job script, which can produce unexpected behavior or cause accidental expansion/injection. Ensure values are properly escaped or use a quoted heredoc so the intended content is delivered to the job.

  • Unvalidated shard
    job_shard is taken directly from $4 and passed into downstream scripts and commands (via --shard in test scripts) without sanitization. A malformed or malicious value could cause unexpected behavior. At minimum validate it's numeric and in the expected shard range (e.g. 1..2).

  • Queue / account assumptions
    The SBATCH header now hardcodes account, partition and QoS (CFD154, batch, hackathon) and reduces walltime. These changes may be restricted or cause job rejections depending on cluster policy; they should be configurable or approved by cluster admins.

  • Build verification
    The retry loop uses a dry-run invocation ("--dry-run") to decide whether the build succeeded.
    A dry-run may not exercise the real build/link steps (and can return success while the real build fails),
    so succeeding the dry-run does not guarantee the following actual test run will succeed. Consider running a real build or verifying artifacts instead of only a dry-run.

  • GPU detection fragility
    GPU detection uses nvidia-smi -L | wc -l and then feeds that into seq and expr.
    If nvidia-smi is not present or returns an error, gpu_count can be empty or non-numeric which will cause seq/expr to fail and break the script. Add robust fallbacks and validation.
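The first two points can be sketched together: a pattern check rejects malformed shard values before they reach downstream commands, and quoting the heredoc delimiter keeps submit-time expansion from rewriting the job body. This is an illustrative standalone sketch, not the submit scripts' actual code:

```shell
#!/bin/sh
# Reject anything that isn't "digits/digits"; a fuller check would also
# enforce 1 <= i <= n, as the review suggests.
job_shard="1/2"
case "$job_shard" in
    ''|*[!0-9/]*|*/*/*|/*|*/) echo "invalid shard: $job_shard" >&2; exit 1 ;;
    */*) ;;                                    # shape "i/n": accepted
    *)   echo "invalid shard: $job_shard" >&2; exit 1 ;;
esac

# Quoting the delimiter ('EOF') suppresses expansion, so $SLURM_JOB_ID is
# delivered to the job script literally instead of expanding at submit
# time, where it would be empty.
cat <<'EOF' > job_body.sh
echo "running as job $SLURM_JOB_ID"
EOF
grep -q '\$SLURM_JOB_ID' job_body.sh && echo "job body preserved literally"
```

With an unquoted `<<EOF`, the same heredoc would emit `running as job ` into job_body.sh, which is the accidental-expansion behavior the review warns about.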

@codeant-ai
Contributor

codeant-ai bot commented Feb 19, 2026

CodeAnt AI finished reviewing your PR.

Contributor

Copilot AI left a comment


Pull request overview

This PR enhances the self-hosted CI infrastructure with test sharding, proactive cleanup, and retry mechanisms to improve reliability and reduce execution time. It addresses cross-compiler contamination issues on persistent runners and enables faster parallel test execution on batch partition systems.

Changes:

  • Add retry logic for GitHub runner tests (≤5 failures trigger automatic retest)
  • Shard Frontier GPU tests into 2 parallel jobs for faster execution
  • Add proactive ./mfc.sh clean to Phoenix test scripts
  • Add --requeue flag to Phoenix SLURM jobs for preemption recovery
  • Wrap Frontier build steps in retry action with automatic cleanup
  • Update Frontier SLURM configuration (account, partition, timeout, QOS)

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
.github/workflows/test.yml Add retry logic for ≤5 test failures, add shard parameter to matrix, wrap builds in retry action, remove deprecated environment variables
.github/workflows/phoenix/test.sh Add proactive ./mfc.sh clean to prevent cross-compiler contamination
.github/workflows/phoenix/submit.sh Add --requeue flag for automatic preemption recovery
.github/workflows/frontier/test.sh Add shard parameter handling for test splitting
.github/workflows/frontier/submit.sh Update SLURM config (account, partition, timeout, QOS) and add shard parameter
.github/workflows/frontier_amd/test.sh Add shard parameter handling for test splitting
.github/workflows/frontier_amd/submit.sh Update SLURM config (account, partition, timeout, QOS) and add shard parameter

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.github/workflows/test.yml (1)

265-274: ⚠️ Potential issue | 🟡 Minor

Log file references don't account for shard — will break if job_slug is fixed.

test-${{ matrix.device }}-${{ matrix.interface }}.out on line 267 assumes the output filename doesn't include a shard suffix. This is currently consistent with the submit scripts, but if the job_slug collision (flagged on frontier_amd/submit.sh) is fixed by incorporating the shard, these references must be updated in tandem.

Also, the artifact name on line 273 doesn't include shard, which could cause upload conflicts for sharded matrix entries with the same device/interface (e.g., two gpu-acc frontier shards). strategy.job-index makes it unique, but adding shard would improve clarity.

Proposed fix (apply after fixing job_slug in submit scripts)
      - name: Print Logs
        if:   always()
-        run:  cat test-${{ matrix.device }}-${{ matrix.interface }}.out
+        run:  cat test-${{ matrix.device }}-${{ matrix.interface }}${{ matrix.shard != '' && format('-{0}', matrix.shard) || '' }}.out

      - name: Archive Logs
        uses: actions/upload-artifact@v4
        if:   matrix.cluster != 'phoenix'
        with:
-          name: logs-${{ strategy.job-index }}-${{ matrix.device }}-${{ matrix.interface }}
+          name: logs-${{ strategy.job-index }}-${{ matrix.device }}-${{ matrix.interface }}${{ matrix.shard != '' && format('-{0}', matrix.shard) || '' }}
-          path: test-${{ matrix.device }}-${{ matrix.interface }}.out
+          path: test-${{ matrix.device }}-${{ matrix.interface }}${{ matrix.shard != '' && format('-{0}', matrix.shard) || '' }}.out

Note: The shard value contains / (e.g., 1/2) which is invalid in filenames. The submit script slug sanitization would need to handle this (e.g., replace / with -of-), and the workflow expressions here would need to match.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test.yml around lines 265 - 274, Update the Print Logs and
Archive Logs steps so the logfile and artifact name include the shard-aware slug
used by the submit scripts (instead of assuming test-${{ matrix.device }}-${{
matrix.interface }}.out). Locate the "Print Logs" and "Archive Logs" steps and
change the referenced filename and artifact name to incorporate the sanitized
job slug/shard token produced by the submit scripts (the same slug that replaces
"/" with a safe separator such as "-of-"); ensure the workflow expression that
builds the filename and the artifact "name" use that sanitized slug so filenames
and artifact names remain unique and valid across sharded jobs.
.github/workflows/frontier_amd/submit.sh (1)

31-32: ⚠️ Potential issue | 🔴 Critical

Job slug does not include shard — SLURM output files collide when sharded tests run concurrently.

When multiple shards for the same device/interface pair run on the same HPC cluster, they produce identical job_slug values (e.g., test-gpu-acc for both shard 1/2 and 2/2), resulting in identical output_file names. Since both SLURM jobs execute from the same SLURM_SUBMIT_DIR, one job's output will silently overwrite the other's. This affects both .github/workflows/frontier/submit.sh and .github/workflows/frontier_amd/submit.sh at line 31.

Incorporate the shard into the slug:

Proposed fix
-job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3"
+shard_suffix=""
+if [ -n "$4" ]; then
+    shard_suffix="-$(echo "$4" | sed 's|/|-of-|')"
+fi
+job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3${shard_suffix}"

Additionally, update .github/workflows/test.yml line 267 and 273 to account for the shard suffix:

  • Line 267: change `cat test-${{ matrix.device }}-${{ matrix.interface }}.out` to `cat test-${{ matrix.device }}-${{ matrix.interface }}${{ matrix.shard != '' && format('-{0}', matrix.shard) || '' | replace('/', '-of-') }}.out`
  • Line 273: include the shard suffix in the artifact name to match

The usage messages in both scripts (line 9) should also be updated to document the interface and shard parameters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/frontier_amd/submit.sh around lines 31 - 32, The job_slug
currently built by job_slug and used for output_file omits the shard, causing
name collisions; update the job_slug generation (the job_slug variable and any
references to output_file) to append the shard identifier (formatting the shard
like "-{shard}" and replacing "/" with "-of-" for values like "1/2") so each
shard produces a unique slug; also update the script usage message (the usage
text near the top that lists parameters) to document the interface and shard
parameters, and update the workflow steps that read and upload artifacts (the
cat command that reads test-${matrix.device}-${matrix.interface}.out and the
artifact name) to include the same shard suffix formatting so artifact names and
printed output match the new job_slug convention.
🧹 Nitpick comments (1)
.github/workflows/frontier_amd/submit.sh (1)

8-9: Usage message is outdated — does not document the interface or shard arguments.

The script accepts up to 4 positional arguments ($1=script, $2=device, $3=interface, $4=shard), but the usage string only mentions the first two.

Proposed fix
 usage() {
-    echo "Usage: $0 [script.sh] [cpu|gpu]"
+    echo "Usage: $0 [script.sh] [cpu|gpu] [none|acc|omp] [shard]"
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/frontier_amd/submit.sh around lines 8 - 9, The usage()
function's message is outdated and only mentions two arguments; update the echo
in usage() to document all supported positional params ($1 script, $2 device
(cpu|gpu), $3 interface, $4 shard) and any defaults or optional markers (e.g.,
"[interface]" "[shard]") so callers see the full signature; edit the echo inside
usage() to a single clear line listing script, device, interface, and shard and
optional/default semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In @.github/workflows/frontier_amd/submit.sh:
- Around line 31-32: The job_slug currently built by job_slug and used for
output_file omits the shard, causing name collisions; update the job_slug
generation (the job_slug variable and any references to output_file) to append
the shard identifier (formatting the shard like "-{shard}" and replacing "/"
with "-of-" for values like "1/2") so each shard produces a unique slug; also
update the script usage message (the usage text near the top that lists
parameters) to document the interface and shard parameters, and update the
workflow steps that read and upload artifacts (the cat command that reads
test-${matrix.device}-${matrix.interface}.out and the artifact name) to include
the same shard suffix formatting so artifact names and printed output match the
new job_slug convention.

In @.github/workflows/test.yml:
- Around line 265-274: Update the Print Logs and Archive Logs steps so the
logfile and artifact name include the shard-aware slug used by the submit
scripts (instead of assuming test-${{ matrix.device }}-${{ matrix.interface
}}.out). Locate the "Print Logs" and "Archive Logs" steps and change the
referenced filename and artifact name to incorporate the sanitized job
slug/shard token produced by the submit scripts (the same slug that replaces "/"
with a safe separator such as "-of-"); ensure the workflow expression that
builds the filename and the artifact "name" use that sanitized slug so filenames
and artifact names remain unique and valid across sharded jobs.

---

Duplicate comments:
In @.github/workflows/frontier/submit.sh:
- Around line 31-32: job_slug and output_file are colliding for parallel shards
because they only use basename("$1") with $2 and $3; update the job_slug
generation (and derived output_file) to include an additional unique shard
identifier (for example a shard index/ID passed as another script argument or a
runtime value like the process/array task id) so each shard produces a distinct
job_slug and output_file; change the construction that sets job_slug and the
assignment of output_file to append that unique identifier.

---

Nitpick comments:
In @.github/workflows/frontier_amd/submit.sh:
- Around line 8-9: The usage() function's message is outdated and only mentions
two arguments; update the echo in usage() to document all supported positional
params ($1 script, $2 device (cpu|gpu), $3 interface, $4 shard) and any defaults
or optional markers (e.g., "[interface]" "[shard]") so callers see the full
signature; edit the echo inside usage() to a single clear line listing script,
device, interface, and shard and optional/default semantics.

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 7 files

Confidence score: 4/5

  • Moderate risk only: the cleanup step in .github/workflows/phoenix/test.sh doesn’t check the ./mfc.sh clean exit status, so failures could allow stale artifacts to affect builds/tests.
  • This is a CI reliability concern rather than a direct product bug, so it’s likely safe to merge with minimal risk.
  • Pay close attention to .github/workflows/phoenix/test.sh - ensure cleanup failures don’t silently proceed.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".github/workflows/phoenix/test.sh">

<violation number="1" location=".github/workflows/phoenix/test.sh:5">
P2: The `./mfc.sh clean` exit status is not checked. If the clean fails, the script continues and may build/test against stale or corrupted artifacts, defeating the purpose of this proactive cleanup and causing hard-to-diagnose failures.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
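The fix cubic suggests is a standard fail-fast guard on the cleanup's exit status. In this sketch, `clean_step` is a stand-in for `./mfc.sh clean` (assumed, not defined here):

```shell
#!/bin/sh
# Stand-in for "./mfc.sh clean"; the real command's exit status is what
# should be checked.
clean_step() { true; }

# Abort immediately if the proactive clean fails, instead of silently
# proceeding to build against possibly stale or corrupted artifacts.
if ! clean_step; then
    echo "clean failed; aborting before build" >&2
    exit 1
fi
echo "clean succeeded; continuing to build"
```

Running the script under `set -e` would achieve the same fail-fast behavior without the explicit `if`, at the cost of a less specific error message.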

The CI test scripts use --shard for splitting Frontier GPU tests across
multiple jobs, and failed_uuids.txt for retry logic. These toolchain
changes were missing from the cherry-pick.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
toolchain/mfc/test/test.py (1)

102-108: Shard filtering is correct — minor readability nit on line 104.

The validation logic handles all edge cases correctly (short-circuit or ensures int() is never called on non-digit strings), and i % shard_count == shard_idx - 1 correctly partitions cases without overlap. The placement after all other filters and before --percent is the right ordering.

Optional: the compound condition on line 104 can be split into guard clauses to improve readability:

♻️ Optional readability refactor
-        if len(parts) != 2 or not all(p.isdigit() for p in parts) or int(parts[1]) < 1 or not 1 <= int(parts[0]) <= int(parts[1]):
-            raise MFCException(f"Invalid --shard '{ARG('shard')}': expected 'i/n' with 1 <= i <= n (e.g., '1/2').")
+        def _bad_shard():
+            if len(parts) != 2 or not all(p.isdigit() for p in parts):
+                return True
+            n, i = int(parts[1]), int(parts[0])
+            return n < 1 or not (1 <= i <= n)
+        if _bad_shard():
+            raise MFCException(f"Invalid --shard '{ARG('shard')}': expected 'i/n' with 1 <= i <= n (e.g., '1/2').")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@toolchain/mfc/test/test.py` around lines 102 - 108, The compound validation
in the ARG("shard") block is correct but hard to read; refactor the conditional
inside the if ARG("shard") is not None: block by splitting the long compound
condition into explicit guard checks: first split = ARG("shard").split("/") and
verify length == 2, then check that both parts are digits (using
parts[0].isdigit() and parts[1].isdigit()), then parse shard_idx = int(parts[0])
and shard_count = int(parts[1]) and validate shard_count >= 1 and 1 <= shard_idx
<= shard_count; on any failure raise MFCException with the same message, then
compute skipped_cases and cases using shard_idx and shard_count as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@toolchain/mfc/test/test.py`:
- Around line 217-224: When abort_tests.is_set() causes test() to raise
MFCException the existing cleanup that writes/removes failed_uuids.txt (the
failed_uuids_path handling around failed_tests, open(...), os.remove(...)) is
skipped; modify the exception/exit path to always attempt to remove stale
failed_uuids_path and wrap file I/O (open and os.remove) in try/except catching
OSError (or Exception) so I/O errors are logged but do not replace the real exit
code—i.e., in the MFCException handler and/or finally block ensure you try to
delete failed_uuids_path if it exists and handle/log any OSError from
open()/os.remove() instead of letting it propagate.

---

Nitpick comments:
In `@toolchain/mfc/test/test.py`:
- Around line 102-108: The compound validation in the ARG("shard") block is
correct but hard to read; refactor the conditional inside the if ARG("shard") is
not None: block by splitting the long compound condition into explicit guard
checks: first split = ARG("shard").split("/") and verify length == 2, then check
that both parts are digits (using parts[0].isdigit() and parts[1].isdigit()),
then parse shard_idx = int(parts[0]) and shard_count = int(parts[1]) and
validate shard_count >= 1 and 1 <= shard_idx <= shard_count; on any failure
raise MFCException with the same message, then compute skipped_cases and cases
using shard_idx and shard_count as before.

@codecov

codecov bot commented Feb 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.05%. Comparing base (84c46e0) to head (2ff6301).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1171   +/-   ##
=======================================
  Coverage   44.05%   44.05%           
=======================================
  Files          70       70           
  Lines       20498    20498           
  Branches     1990     1990           
=======================================
  Hits         9030     9030           
  Misses      10329    10329           
  Partials     1139     1139           

☔ View full report in Codecov by Sentry.

@codeant-ai
Contributor

codeant-ai bot commented Feb 20, 2026

CodeAnt AI is running Incremental review



@codeant-ai codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:M This PR changes 30-99 lines, ignoring generated files labels Feb 20, 2026
@codeant-ai
Copy link
Contributor

codeant-ai bot commented Feb 20, 2026

CodeAnt AI Incremental review completed.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/test.yml:
- Around line 146-166: The workflow is triggering shellcheck/ actionlint SC2046
on unquoted command substitution for nproc; update both invocations of -j
$(nproc) in the test run commands (the two calls to "/bin/bash mfc.sh test -v
... -j $(nproc) ..." and the retry call) to quote the substitution as -j
"$(nproc)"; leave $TEST_ALL, $TEST_PCT and the intentionally unquoted $FAILED
as-is and ensure the surrounding logic using TEST_EXIT, failed_uuids.txt and the
retry branch remains unchanged.

sbryngelson and others added 2 commits February 20, 2026 14:44
- Clean up failed_uuids.txt on early abort path so CI doesn't retry
  stale UUIDs from a previous run
- Guard retry condition with NUM_FAILED > 0 to prevent full-suite
  rerun when the file exists but is empty
- Quote $(nproc) to silence shellcheck SC2046 warnings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The build system should handle compiler changes correctly. Proactive
clean forces full rebuilds of FFTW/LAPACK from scratch every run,
which is slow and exposes builds to transient filesystem failures
(CMake TryCompile errors on Phoenix scratch).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
toolchain/mfc/test/test.py (1)

104-104: Prefer str.isdecimal() over str.isdigit() for the digit guard.

str.isdigit() returns True for Unicode super/subscript characters like "²", but int("²") raises a ValueError. This creates a gap: a value like "²/2" passes the guard but crashes on int(parts[0]). str.isdecimal() limits acceptance to [0-9], closing the gap.

♻️ Proposed fix
-        if len(parts) != 2 or not all(p.isdigit() for p in parts) or int(parts[1]) < 1 or not 1 <= int(parts[0]) <= int(parts[1]):
+        if len(parts) != 2 or not all(p.isdecimal() for p in parts) or int(parts[1]) < 1 or not 1 <= int(parts[0]) <= int(parts[1]):
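The gap the reviewer describes is easy to reproduce; the shard string `"²/2"` below is an illustrative value, not one from the codebase:

```python
shard = "²/2"
parts = shard.split("/")

# The old guard accepts the superscript digit...
assert all(p.isdigit() for p in parts)
# ...but int() cannot parse it.
try:
    int(parts[0])
except ValueError:
    print("int() rejected a value that passed isdigit()")

# The stricter guard closes the gap: isdecimal() accepts only [0-9].
assert not parts[0].isdecimal()
assert "2".isdecimal()
```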
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@toolchain/mfc/test/test.py` at line 104, The guard currently uses
str.isdigit() on elements of parts in the conditional (the expression containing
"len(parts) != 2 or not all(p.isdigit() for p in parts) ..."), which allows
non-ASCII numeric characters that int() can't parse; update the predicate to use
str.isdecimal() instead (i.e., replace the all(p.isdigit() for p in parts) check
with all(p.isdecimal() for p in parts)) so only ASCII digits [0-9] are accepted
before calling int() on parts[0] and parts[1].
.github/workflows/test.yml (2)

277-282: Artifact name omits the shard — consider adding it for easier log identification.

Currently logs-${{ strategy.job-index }}-${{ matrix.device }}-${{ matrix.interface }} is unique across matrix entries (thanks to strategy.job-index), but the shard isn't visible in the artifact name. When debugging a sharded run it's helpful to know at a glance whether the log came from shard 1/2 or 2/2. A minor ergonomics improvement:

♻️ Proposed name including shard
-          name: logs-${{ strategy.job-index }}-${{ matrix.device }}-${{ matrix.interface }}
+          name: logs-${{ strategy.job-index }}-${{ matrix.device }}-${{ matrix.interface }}${{ matrix.shard != '' && format('-shard-{0}', matrix.shard) || '' }}

Note: GitHub artifact names cannot contain /, so you'll need to sanitize the shard value (e.g., replace / with -):

+          name: logs-${{ strategy.job-index }}-${{ matrix.device }}-${{ matrix.interface }}${{ matrix.shard != '' && format('-shard-{0}', replace(matrix.shard, '/', '-')) || '' }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test.yml around lines 277 - 282, The artifact name for the
"Archive Logs" step currently uses logs-${{ strategy.job-index }}-${{
matrix.device }}-${{ matrix.interface }}, which omits the shard; update the
Upload Artifact step (name: Archive Logs, uses: actions/upload-artifact@v4) to
append the sanitized shard value from matrix.shard (use the replace expression
to substitute '/' with '-' e.g. replace(matrix.shard, '/', '-')) so the artifact
name becomes logs-${{ strategy.job-index }}-${{ matrix.device }}-${{
matrix.interface }}-<sanitized shard>, ensuring shard information is visible and
safe for GitHub artifact naming.

146-166: Useless use of cat on line 155.

FAILED=$(cat tests/failed_uuids.txt | tr '\n' ' ') forks an extra process unnecessarily.

♻️ Proposed fix
-              FAILED=$(cat tests/failed_uuids.txt | tr '\n' ' ')
+              FAILED=$(tr '\n' ' ' < tests/failed_uuids.txt)
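The two forms are behaviorally identical, only the process count differs; a quick sanity check of the redirection form (the file name here just mirrors the workflow's):

```shell
#!/bin/sh
# Input redirection feeds tr directly, avoiding the extra cat fork.
printf 'uuid-1\nuuid-2\n' > failed_uuids.txt
FAILED=$(tr '\n' ' ' < failed_uuids.txt)
echo "$FAILED"
```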
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test.yml around lines 146 - 166, The assignment to FAILED
currently uses a useless cat pipeline ("FAILED=$(cat tests/failed_uuids.txt | tr
'\n' ' ')") which spawns an extra process; change it to read the file directly
into tr or use shell builtins to avoid forking (for example use input
redirection with tr like "FAILED=$(tr '\n' ' ' < tests/failed_uuids.txt)" or use
readarray/mapfile to populate and join), leaving the surrounding logic
(NUM_FAILED, retry invocation of mfc.sh, and TEST_EXIT handling) untouched and
still referencing tests/failed_uuids.txt and the FAILED variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@toolchain/mfc/test/test.py`:
- Around line 193-196: The unguarded os.remove(failed_uuids_path) can raise
OSError and mask the original MFCException on the abort path; wrap the call to
os.remove (referring to failed_uuids_path) in a try/except OSError that either
logs the exception or quietly ignores it (consistent with the I/O block pattern
used elsewhere) so the original MFCException from the abort flow is preserved.
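The suggested guard can be sketched as a small helper; `remove_quietly` is a hypothetical name, not one from the toolchain:

```python
import os

def remove_quietly(path: str) -> None:
    """Remove a file, ignoring OSError so a failed cleanup cannot mask
    an exception already propagating on the abort path."""
    try:
        os.remove(path)
    except OSError:
        pass

# Safe even when the file was never created:
remove_quietly("tests/failed_uuids.txt")
print("abort path continues")
```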

---

Duplicate comments:
In @.github/workflows/test.yml:
- Line 31: The quoting issue for the CPU count in the workflow step has already
been fixed; no code change is required—leave the step calling the script as
"./mfc.sh format -j \"$(nproc)\"" in the GitHub Actions workflow to preserve
correct quoting and shell expansion.

In `@toolchain/mfc/test/test.py`:
- Around line 222-229: The code writing/removing failed_uuids_path (using
failed_uuids_path, failed_tests, open(), os.remove()) is unguarded and can raise
I/O errors that would replace the intended exit(nFAIL); wrap the write and the
remove branches in try/except blocks that catch Exception, log or print the
error with context (including the path and exception), and ensure the script
still calls exit(nFAIL) after handling the I/O error so the test failure status
is preserved.



codeant-ai bot commented Feb 21, 2026

CodeAnt AI is running Incremental review



@codeant-ai codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Feb 21, 2026

codeant-ai bot commented Feb 21, 2026

CodeAnt AI Incremental review completed.

sbryngelson and others added 3 commits February 20, 2026 22:20
Bot reviews (AI code reviewers) were triggering the benchmark workflow,
and the concurrency group was cancelling the real benchmark run from
the pull_request event. Gate the workflow early by skipping when the
review author is a Bot account type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
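A minimal sketch of the gate this commit describes; the job name and runner are placeholders, while the event field follows the GitHub `pull_request_review` payload (whether the real workflow gates at the job or workflow level is an assumption here):

```yaml
# Hypothetical sketch: skip the benchmark job when the triggering
# review was authored by a Bot account.
jobs:
  benchmark:
    if: github.event_name != 'pull_request_review' || github.event.review.user.type != 'Bot'
    runs-on: ubuntu-latest
    steps:
      - run: echo "benchmarking"
```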
…ty file check

- Wrap os.remove() in try/except OSError on abort path so permission errors
  don't mask the real MFCException
- Only pass --precision flag when matrix.precision is non-empty to avoid
  invalid bare -- argument
- Use -s instead of -f for failed_uuids.txt to skip retry when file exists
  but is empty

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
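The `-s` versus `-f` distinction from the commit message, demonstrated on an empty file:

```shell
#!/bin/sh
# -f tests only existence; -s tests existence AND non-zero size.
touch empty.txt
[ -f empty.txt ] && echo "-f is true for an empty file"
[ -s empty.txt ] || echo "-s is false for an empty file"
```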